Scalable Name Disambiguation using Multi-level Graph Partition

نویسندگان

  • Byung-Won On
  • Dongwon Lee
چکیده

When non-unique values are used as the identifier of entities, due to their homonym, confusion can occur. In particular, when (part of) “names” of entities are used as their identifier, the problem is often referred to as the name disambiguation problem, where goal is to sort out the erroneous entities due to name homonyms (e.g., if only last name is used as the identifier, one cannot distinguish “Vannevar Bush” from “George Bush”). In this paper, in particular, we study the scalability issue of the name disambiguation problem – when (1) a small number of entities with large contents or (2) a large number of entities get un-distinguishable due to homonyms, how to resolve it? We first carefully examine two of the state-of-the-art solutions to the name disambiguation problem, and point out their limitations with respect to scalability. Then, we adapt the multi-level graph partition technique to solve the large-scale name disambiguation problem. Our claim is empirically validated via experimentation – our proposal shows orders of magnitude improvement in terms of performance while maintaining equivalent or reasonable accuracy compared to competing solutions.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Optimizing Teleportation Cost in Multi-Partition Distributed Quantum Circuits

There are many obstacles in quantum circuits implementation with large scales, so distributed quantum systems are appropriate solution for these quantum circuits. Therefore, reducing the number of quantum teleportation leads to improve the cost of implementing a quantum circuit. The minimum number of teleportations can be considered as a measure of the efficiency of distributed quantum systems....

متن کامل

Name Disambiguation from link data in a collaboration graph

In a social community, multiple persons may share the same name, phone number or some other identifying attributes. This, along with other phenomena, such as name abbreviation, name misspelling, and human error leads to erroneous aggregation of records of multiple persons under a single reference. Such mistakes affect the performance of document retrieval, web search, database integration, and ...

متن کامل

Improvement of the E ciency of Genetic Algorithms for Scalable Parallel Graph Partitioning in a Multi-Level Framework

Parallel graph partitioning is a di cult issue, because the best sequential graph partitioning methods known to date are based on iterative local optimization algorithms that do not parallelize nor scale well. On the other hand, evolutionary algorithms are highly parallel and scalable, but converge very slowly as problem size increases. This paper presents methods that can be used to reduce pro...

متن کامل

Improvement of the Efficiency of Genetic Algorithms for Scalable Parallel Graph Partitioning in a Multi-level Framework

Parallel graph partitioning is a difficult issue, because the best sequential graph partitioning methods known to date are based on iterative local optimization algorithms that do not parallelize nor scale well. On the other hand, evolutionary algorithms are highly parallel and scalable, but converge very slowly as problem size increases. This paper presents methods that can be used to reduce p...

متن کامل

Name List Only? Target Entity Disambiguation in Short Texts

Target entity disambiguation (TED), the task of identifying target entities of the same domain, has been recognized as a critical step in various important applications. In this paper, we propose a graphbased model called TremenRank to collectively identify target entities in short texts given a name list only. TremenRank propagates trust within the graph, allowing for an arbitrary number of ta...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007